MetaGCD: Learning to Continually Learn in Generalized Category Discovery
In this paper, we consider a real-world scenario where a model that is
trained on pre-defined classes continually encounters unlabeled data that
contains both known and novel classes. The goal is to continually discover
novel classes while maintaining performance on known classes. We name this
setting Continual Generalized Category Discovery (C-GCD). Existing methods for
novel class discovery cannot directly handle the C-GCD setting due to some
unrealistic assumptions, such as the unlabeled data only containing novel
classes. Furthermore, they fail to discover novel classes in a continual
fashion. In this work, we lift all these assumptions and propose an approach,
called MetaGCD, that learns to incrementally discover novel classes with less forgetting.
Our proposed method uses a meta-learning framework and leverages the offline
labeled data to simulate the testing incremental learning process. A
meta-objective is defined over two conflicting learning objectives
to achieve novel class discovery without forgetting. Furthermore, a soft
neighborhood-based contrastive network is proposed to discriminate uncorrelated
images while attracting correlated images. We build strong baselines and
conduct extensive experiments on three widely used benchmarks to demonstrate
the superiority of our method.
Comment: This paper has been accepted by ICCV202
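As a concrete illustration, the soft neighborhood-based contrastive idea can be sketched as a weighted InfoNCE-style loss in which an anchor attracts its nearest neighbours in proportion to their similarity, while uncorrelated images are pushed away. This is a minimal sketch under assumed design choices (a top-k neighbourhood with similarity-proportional weights); the paper's exact formulation is not given in the abstract.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def soft_neighbor_contrastive_loss(feats, anchor, k=2, temperature=0.1):
    """Contrastive loss for one anchor over all other embeddings.

    Instead of a single hard positive, the anchor's k most similar
    candidates receive soft weights proportional to their similarity
    (an assumed weighting scheme), so correlated images are attracted
    softly rather than via a hard positive/negative split.
    """
    others = [j for j in range(len(feats)) if j != anchor]
    sims = {j: cosine(feats[anchor], feats[j]) for j in others}
    # soft neighbourhood: top-k most similar candidates, weights ~ similarity
    topk = sorted(others, key=lambda j: sims[j], reverse=True)[:k]
    total = sum(max(sims[j], 1e-8) for j in topk)
    weights = {j: max(sims[j], 1e-8) / total for j in topk}
    # InfoNCE-style log-probabilities over all candidates
    z = sum(math.exp(sims[j] / temperature) for j in others)
    return -sum(w * math.log(math.exp(sims[j] / temperature) / z)
                for j, w in weights.items())
```

A tight cluster of similar embeddings yields a small loss, while a dissimilar neighbourhood yields a large one, which is the attraction/repulsion behaviour the abstract describes.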
Domain Adaptive Attention Model for Unsupervised Cross-Domain Person Re-Identification
Person re-identification (Re-ID) across multiple datasets is a challenging
yet important task due to the possibly large distinctions between different
datasets and the lack of training samples in practical applications. This work
proposes a novel unsupervised domain adaptation framework that transfers
discriminative representations from the labeled source domain (dataset) to the
unlabeled target domain (dataset). We propose to formulate the domain
adaptation task as a one-class classification problem with a novel domain similarity
loss. Given the feature map of any image from a backbone network, a novel
domain adaptive attention model (DAAM) first automatically learns to separate
the feature map of an image to a domain-shared feature (DSH) map and a
domain-specific feature (DSP) map simultaneously. Specifically, a residual
attention mechanism is designed to model the DSP feature map and avoid negative
transfer. Then, a DSH branch and a DSP branch are introduced to learn DSH and
DSP feature maps respectively. To reduce the domain divergence caused by the
source and target datasets being collected from different environments, we
project the DSH feature maps from different domains into a new nominal domain,
and propose a novel domain similarity loss based on one-class classification.
In addition, a novel unsupervised person Re-ID loss is proposed to make full
use of the unlabeled target data. Extensive experiments on the
Market-1501 and DukeMTMC-reID benchmarks demonstrate state-of-the-art
performance of the proposed method. Code will be released to facilitate further
studies on the cross-domain person re-identification task.
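A minimal sketch of the residual-attention split and the one-class pull toward a nominal domain might look as follows. The attention values and the center-based one-class loss (in the style of Deep SVDD) are illustrative assumptions, not the paper's exact learned modules or losses.

```python
def split_feature_map(feature, attention):
    """Residual-attention split of a feature vector F.

    DSP = A * F   (attention picks out domain-specific content)
    DSH = F - DSP (the residual keeps the domain-shared content)

    Here `attention` is a hypothetical stand-in for the learned
    attention module's output, one weight per feature element.
    """
    dsp = [a * f for a, f in zip(attention, feature)]
    dsh = [f - p for f, p in zip(feature, dsp)]
    return dsh, dsp

def domain_similarity_loss(dsh_features, center):
    """One-class-style loss: pull DSH features from every domain toward a
    single nominal-domain center, so source and target overlap there."""
    return sum(sum((x - c) ** 2 for x, c in zip(f, center))
               for f in dsh_features) / len(dsh_features)
```

By construction the two parts sum back to the original feature map, so the residual branch cannot discard information; it only routes it between the shared and specific paths.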
CMTR: Cross-modality Transformer for Visible-infrared Person Re-identification
Visible-infrared cross-modality person re-identification is a challenging
ReID task, which aims to retrieve and match the same identity's images between
the heterogeneous visible and infrared modalities. Thus, the core of this task
is to bridge the huge gap between these two modalities. The existing
convolutional neural network-based methods mainly face the problem of
insufficient perception of each modality's information, and cannot learn good
discriminative modality-invariant embeddings for identities, which limits their
performance. To solve these problems, we propose a cross-modality
transformer-based method (CMTR) for the visible-infrared person
re-identification task, which can explicitly mine the information of each
modality and generate better discriminative features based on it. Specifically,
to capture each modality's characteristics, we design novel modality
embeddings, which are fused with token embeddings to encode modalities'
information. Furthermore, to enhance the representation of the modality embeddings and
adjust matching embeddings' distribution, we propose a modality-aware
enhancement loss based on the learned modalities' information, reducing
intra-class distance and enlarging inter-class distance. To our knowledge, this
is the first work to apply a transformer network to the cross-modality
re-identification task. We conduct extensive experiments on the public
SYSU-MM01 and RegDB datasets, and our proposed CMTR model's performance
significantly surpasses that of existing CNN-based methods.
Comment: 11 pages, 7 figures, 7 tables
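The modality-embedding fusion can be sketched as adding one learned vector per modality to every token embedding, analogous to how position embeddings are fused in a ViT-style encoder. The modality names, table layout, and fusion-by-addition below are assumptions made for illustration.

```python
# Hypothetical modality ids; the abstract only names the two modalities.
MODALITY_ID = {"visible": 0, "infrared": 1}

def fuse_modality_embedding(token_embeddings, modality, modality_table):
    """Add the learned per-modality vector to every token embedding.

    token_embeddings: list of token vectors for one image.
    modality_table:   one learned vector per modality (assumed shape).
    The encoder can then condition on which modality each image came from.
    """
    m = modality_table[MODALITY_ID[modality]]
    return [[t + e for t, e in zip(tok, m)] for tok in token_embeddings]
```

Because the same vector is shared by all tokens of an image, the encoder receives a consistent, image-level signal about the input modality, which is what lets the network explicitly mine each modality's information.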
Subgraph and object context‐masked network for scene graph generation
Scene graph generation aims to recognise objects and their semantic relationships in an image and can help computers understand visual scenes. To improve relationship prediction, geometry information is essential and is usually incorporated into relationship features. Existing methods use the coordinates of objects to encode their spatial layout; however, in this way they neglect the context of the objects. In this study, to make full use of spatial knowledge efficiently, the authors propose a novel subgraph and object context‐masked network (SOCNet), consisting of spatial mask relation inference (SMRI) and hierarchical message passing (HMP) modules, to address the scene graph generation task. In particular, to take advantage of spatial knowledge, SMRI masks part of the context of object features, depending on the spatial layout of the objects and the corresponding subgraph, to facilitate relationship recognition. To refine the features of objects and subgraphs, they also propose HMP, which passes highly correlated messages from both microcosmic and macroscopic aspects through a triple‐path structure comprising subgraph–subgraph, object–object, and subgraph–object paths. Finally, a statistical co‐occurrence probability is used to regularise relationship prediction. SOCNet integrates HMP and SMRI into a unified network, and comprehensive experiments on the Visual Relationship Detection and Visual Genome datasets indicate that SOCNet outperforms several state‐of‐the‐art methods on two common tasks.
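The hierarchical message passing can be pictured with a toy synchronous update in which each object mixes in the feature of its subgraph and each subgraph averages its member objects. The mixing weight `alpha` and the simple average are hypothetical simplifications of the paper's triple-path scheme, shown only to make the subgraph–object exchange concrete.

```python
def pass_messages(obj_feats, subgraph_feats, obj_to_subgraph, alpha=0.5):
    """One synchronous round of subgraph<->object message passing.

    obj_feats:       list of object feature vectors.
    subgraph_feats:  list of subgraph feature vectors.
    obj_to_subgraph: for each object index, the index of its subgraph.
    alpha:           assumed mixing weight for incoming messages.
    """
    # object update: mix in the feature of the subgraph the object belongs to
    new_obj = []
    for i, f in enumerate(obj_feats):
        s = subgraph_feats[obj_to_subgraph[i]]
        new_obj.append([(1 - alpha) * a + alpha * b for a, b in zip(f, s)])
    # subgraph update: mix in the average of the subgraph's member objects
    new_sub = []
    for k, s in enumerate(subgraph_feats):
        members = [obj_feats[i] for i in range(len(obj_feats))
                   if obj_to_subgraph[i] == k]
        if members:
            avg = [sum(col) / len(members) for col in zip(*members)]
            new_sub.append([(1 - alpha) * a + alpha * b for a, b in zip(s, avg)])
        else:
            new_sub.append(s)
    return new_obj, new_sub
```

Repeating such rounds propagates context along the subgraph–object paths; the full model additionally passes messages along object–object and subgraph–subgraph paths.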